A Prefetching Technique for Irregular Accesses to Linked Data Structures
Authors
Abstract
Prefetching offers the potential to improve the performance of linked data structure (LDS) traversals. However, previously proposed prefetching methods only work well when processing a node involves enough work to hide the prefetch latency, or when the LDS is long enough and the traversal path is known a priori. This paper presents a prefetching technique called prefetch arrays, which can prefetch both short LDSs, such as the lists found in hash tables, and trees whose traversal path is not known a priori. We offer two implementations: one software-only, and one that combines software annotations with a prefetch engine in hardware. On a pointer-intensive benchmark suite, we show that our implementations reduce the memory stall time by 23% to 51% for the kernels with linked lists, while the other prefetching methods achieve substantially smaller reductions. For binary trees, our hardware method cuts nearly 60% of the memory stall time even when the traversal path is not known a priori. However, when the branching factor of the tree is too high, our technique does not improve performance. Another contribution of the paper is that we quantify the pointer-chasing found in interesting applications such as OLTP, expert systems, DSS, and Java codes, and discuss which prefetching techniques are relevant in each case.
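The abstract does not give code, so the following is only a minimal sketch of how a software-only prefetch array might look for the short lists found in hash tables. It assumes that each bucket header stores pointers to the first few nodes of its list and that all of them are prefetched before the traversal starts. The names `PF_SLOTS`, `struct bucket`, `bucket_refresh`, `bucket_insert`, and `bucket_lookup` are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin rather than anything taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

#define PF_SLOTS 4  /* hypothetical: how many nodes each bucket can prefetch */

struct node {
    int key;
    int value;
    struct node *next;
};

struct bucket {
    struct node *head;
    struct node *pf[PF_SLOTS];  /* prefetch array: the first PF_SLOTS nodes */
};

/* Rebuild the prefetch array so that it mirrors the front of the list. */
static void bucket_refresh(struct bucket *b)
{
    struct node *n = b->head;
    for (size_t i = 0; i < PF_SLOTS; i++) {
        b->pf[i] = n;           /* NULL once the list is exhausted */
        if (n)
            n = n->next;
    }
}

/* Push a node at the front of the list and keep the prefetch array in sync. */
static void bucket_insert(struct bucket *b, struct node *n)
{
    assert(n != NULL);
    n->next = b->head;
    b->head = n;
    bucket_refresh(b);
}

/* Issue all prefetches up front, then walk the list as usual. */
static struct node *bucket_lookup(const struct bucket *b, int key)
{
    for (size_t i = 0; i < PF_SLOTS; i++) {
        if (b->pf[i])
            __builtin_prefetch(b->pf[i], 0, 3);  /* read access, keep in cache */
    }
    for (struct node *n = b->head; n; n = n->next) {
        if (n->key == key)
            return n;
    }
    return NULL;
}
```

In this sketch the prefetches are independent of one another, so the misses on the first `PF_SLOTS` nodes can overlap instead of being serialized by pointer chasing. The cost is extra space in each bucket and the work of refreshing the array on every update.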
Related resources
Dependence Based Prefetching for Linked Data Structures
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. Our technique exploits the dependence relationships that exist between loads that produce addresses and loads that consume these addresses. By identifying producer-consumer pairs, we construct a compact internal representation for the associate...
Memory-Side Prefetching for Linked Data Structures
This work studies a memory-side prefetching technique to hide latency incurred by inherently serial accesses to linked data structures (LDS). A programmable prefetch engine sits close to memory and traverses LDS independently from the processor. The prefetch engine can run ahead of the processor because of its low latency, high bandwidth path to memory. This allows the prefetch engine to initia...
Recurrence analysis for effective array prefetching in Java
Java is an attractive choice for numerical, as well as other, algorithms due to the software engineering benefits of object-oriented programming. Because numerical programs often use large arrays that do not fit in the cache, they suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures ...
Page Rank Prefetching for Optimizing Accesses to Web Page Clusters
This paper presents a Page Rank based prefetching technique for accesses to web page clusters. The approach uses the link structure of a requested page to determine the “most important” linked pages and to identify the page(s) to be prefetched. The underlying premise of our approach is that in the case of cluster accesses, the next pages requested by users of the web server are typically based ...
Programming Large Dynamic Data Structures on a DSM Cluster of Multicores
Applications in increasingly important domains such as data mining and graph analysis operate on very large, dynamically constructed graphs, i.e. they are composed of dynamically allocated objects linked together via pointers. Parallel algorithms on large graphs can greatly benefit from software Distributed Shared Memory’s (DSM) convenience of shared-memory programming and computational scalabil...